NSF PAR Search | NSF Public Access Repository

Can we teach language models to gloss endangered languages?

https://doi.org/10.18653/v1/2024.findings-emnlp.337

Ginn, Michael; Hulden, Mans; Palmer, Alexis (November 2024, Association for Computational Linguistics)

Interlinear glossed text (IGT) is a popular format in language documentation projects, where each morpheme is labeled with a descriptive annotation. Automating the creation of interlinear glossed text would be desirable to reduce annotator effort and maintain consistency across annotated corpora. Prior research has explored a number of statistical and neural methods for automatically producing IGT. As large language models (LLMs) have shown promising results across multilingual tasks, even for rare, endangered languages, it is natural to wonder whether they can be utilized for the task of generating IGT. We explore whether LLMs can be effective at the task of interlinear glossing with in-context learning, without any traditional training. We propose new approaches for selecting examples to provide in context, observing that targeted selection can significantly improve performance. We find that LLM-based methods beat standard transformer baselines, despite requiring no training at all. These approaches still underperform state-of-the-art supervised systems for the task, but are highly practical for researchers outside of the NLP community, requiring minimal effort to use.
Full Text Available
On the Robustness of Neural Models for Full Sentence Transformation

https://doi.org/10.18653/v1/2024.americasnlp-1.19

Ginn, Michael; Marashian, Ali; Shandilya, Bhargav; Post, Claire; Rice, Enora; Vásquez, Juan; Mcgregor, Marie; Buchholz, Matthew; Hulden, Mans; Palmer, Alexis (June 2024, Association for Computational Linguistics)

This paper describes the LECS Lab submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages. The task requires transforming a base sentence with regard to one or more linguistic properties (such as negation or tense). We observe that this task shares many similarities with the well-studied task of word-level morphological inflection, and we explore whether the findings from inflection research are applicable to this task. In particular, we experiment with a number of augmentation strategies, finding that they can significantly benefit performance, but that not all augmented data is necessarily beneficial. Furthermore, we find that our character-level neural models show high variability with regard to performance on unseen data, and may not be the best choice when training data is limited.
Full Text Available
Findings of the SIGMORPHON 2023 Shared Task on Interlinear Glossing

https://doi.org/10.18653/v1/2023.sigmorphon-1.20

Ginn, Michael; Moeller, Sarah; Palmer, Alexis; Stacey, Anna; Nicolai, Garrett; Hulden, Mans; Silfverberg, Miikka (July 2023, Association for Computational Linguistics)

This paper presents the findings of the SIGMORPHON 2023 Shared Task on Interlinear Glossing. This first iteration of the shared task explores glossing of a set of six typologically diverse languages: Arapaho, Gitksan, Lezgi, Natügu, Tsez and Uspanteko. The shared task encompasses two tracks: a resource-scarce closed track and an open track, where participants are allowed to utilize external data resources. Five teams participated in the shared task. The winning team, Tü-CL, achieved a 23.99 percentage-point improvement over a baseline RoBERTa system in the closed track and a 17.42 percentage-point improvement in the open track.
Full Text Available
On the Complexity and Typology of Inflectional Morphological Systems

https://doi.org/10.1162/tacl_a_00271

Cotterell, Ryan; Kirov, Christo; Hulden, Mans; Eisner, Jason (November 2019, Transactions of the Association for Computational Linguistics)

We quantify the linguistic complexity of different languages’ morphological systems. We verify that there is a statistically significant empirical trade-off between paradigm size and irregularity: A language’s inflectional paradigms may be either large in size or highly irregular, but never both. We define a new measure of paradigm irregularity based on the conditional entropy of the surface realization of a paradigm, that is, how hard it is to jointly predict all the word forms in a paradigm from the lemma. We estimate irregularity by training a predictive model. Our measurements are taken on large morphological paradigms from 36 typologically diverse languages.
Full Text Available
UniMorph 3.0: Universal Morphology

McCarthy, Arya D.; Kirov, Christo; Grella, Matteo; Nidhi, Amrit; Xia, Patrick; Gorman, Kyle; Vylomova, Ekaterina; Mielke, Sabrina J.; Nicolai, Garrett; Silfverberg, Miikka; et al (May 2020, Proceedings of the 12th Language Resources and Evaluation Conference)

Full Text Available
SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

https://doi.org/10.18653/v1/2020.sigmorphon-1.1

Vylomova, Ekaterina; White, Jennifer; Salesky, Elizabeth; Mielke, Sabrina J.; Wu, Shijie; Ponti, Edoardo Maria; Hall Maudslay, Rowan; Zmigrod, Ran; Valvoda, Josef; Toldova, Svetlana; et al (July 2020, Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology)

Full Text Available
